Activation of Voice-Controlled Apparatus

Field of the Invention 

The present invention relates to the activation of voice-controlled apparatus.

Background of the Invention 

Voice control of apparatus is becoming more common, and there are now well-developed technologies for speech recognition, particularly in contexts that only require small vocabularies.

However, a problem exists where there are multiple voice-controlled apparatus in close proximity, since their vocabularies are likely to overlap, giving rise to the possibility of several different pieces of apparatus responding to the same voice command.


It is known from US 5,991,726 to provide a proximity sensor on a piece of voice-controlled industrial machinery or equipment. Activation of the machinery or equipment by voice can only be effected if a person is standing nearby. However, pieces of industrial machinery or equipment of the type being considered are generally not closely packed, so that whilst the proximity sensor has the effect of making voice control specific to the item concerned in that context, the same would not be true for voice-controlled kitchen appliances, as in the latter case the detection zones of the proximity sensors are likely to overlap.

One way of overcoming the problem of voice control activating multiple pieces of apparatus is to require each voice command to be immediately preceded by speaking the name of the specific apparatus it is wished to control, so that only that apparatus takes notice of the following command. This approach is not, however, user friendly, and users frequently forget to follow such a command protocol, particularly when in a hurry.




It is an object of the present invention to provide a user-friendly way of minimising the risk of unintended activation of multiple voice-controlled apparatus by the same verbal command.



Summary of the Invention

According to the present invention, there is provided a method of activating voice-controlled apparatus, comprising the steps of:

(a) - detecting when the user is looking towards the apparatus;

(b) - detecting when the user is speaking to the apparatus; and

(c) - enabling the apparatus for voice control only if steps (a) and (b) indicate that the user is simultaneously looking towards the apparatus and speaking to it.

The present invention also encompasses a system and apparatus embodying the foregoing method of the invention.
Brief Description of the Drawings

Embodiments of the invention will now be described, by way of non-limiting example, with reference to the accompanying diagrammatic drawings, in which:

Figure 1 is a diagram illustrating a room equipped with camera-equipped voice-controlled devices;

Figure 2 is a diagram illustrating a camera-equipped room for controlling activation of voice-controlled devices in the room;

Figure 3 is a diagram illustrating a room in which there is a user with a head-mounted camera arrangement for controlling activation of voice-controlled devices in the room; and

Figure 4 is a diagram illustrating a room in which there is a user with a head-mounted infrared pointer for controlling activation of voice-controlled devices in the room.



Best Mode of Carrying Out the Invention

Figure 1 shows a work space 11 in which a user 10 is present. Within the space 11 are three voice-controlled devices 14 (hereinafter referred to as devices A, B and C respectively), each with different functionality but each provided with a similar user-interface subsystem 15 permitting voice control of the device by the user.

More particularly, and with reference to device C, the user-interface subsystem comprises: 

- a camera 20 feeding an image processing unit 24 that, when enabled, is operative to analyse the image provided by the camera to detect any human face in the image and determine whether the face image is a full frontal image, indicating that the user is looking towards the camera and thus towards the device C. The visibility of both eyes can be used to determine whether the face image is a frontal one.

- a microphone 21 feeding a speech recognition unit 23 which, when enabled, is operative to recognise a small vocabulary of command words associated with the device and output corresponding command signals to a control block 26 for controlling the main functionality of the device itself (the control block can also receive input from other types of input controls, such as mechanical switches, so as to provide an alternative to the voice-controlled interface 15).

- a sound-detection unit 27 fed by microphone 21 which triggers activation of the image processing unit upon detecting a sound (this need not be a speech sound). Once triggered, the image processing unit will initiate and complete an analysis of the current image from camera 20.

- an activation control block 25 controlling activation of the speech recognition unit 23 in dependence on the output of the image processing unit 24; in particular, when the image processing unit indicates that the user is looking at the device, the block 25 enables the speech recognition unit.
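The per-frame decision made by the Figure 1 subsystem can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the RMS threshold, the `FaceObservation` record and all function names are hypothetical, and a real face detector (not shown) is assumed to report whether each eye of a detected face is visible. The sketch covers only the initial look-and-sound gating, not the continuation of enablement while the user keeps speaking.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical threshold: RMS level above which the sound-detection unit
# treats the microphone input as "a sound" (any sound, not only speech).
SOUND_RMS_THRESHOLD = 0.02


def sound_detected(samples: Sequence[float],
                   threshold: float = SOUND_RMS_THRESHOLD) -> bool:
    """Sound-detection unit 27: fire when the frame's RMS energy exceeds a threshold."""
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms > threshold


@dataclass
class FaceObservation:
    """Hypothetical output of a face detector for one face in the camera image."""
    left_eye_visible: bool
    right_eye_visible: bool


def is_frontal(face: Optional[FaceObservation]) -> bool:
    """Image processing unit 24: treat a face as frontal when both eyes are visible."""
    return face is not None and face.left_eye_visible and face.right_eye_visible


def should_enable_recogniser(samples: Sequence[float],
                             face: Optional[FaceObservation]) -> bool:
    """Activation control block 25: enable speech recognition only if a sound
    triggered image analysis and the analysed face is frontal."""
    if not sound_detected(samples):
        return False  # image processing unit stays idle; recogniser stays inhibited
    return is_frontal(face)
```

For example, `should_enable_recogniser([0.1, -0.1, 0.1, -0.1], FaceObservation(True, True))` enables the recogniser, whereas the same sound with only one eye visible (a profile view) does not.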
Since the image processing unit 24 will take a finite time to analyse the camera image, a corresponding duration of the most recent output from the microphone is buffered so that what a user says when first looking towards the device is available to the speech recogniser when the latter is enabled as a consequence of the image processing unit producing a delayed indication that the user is looking towards the device. An alternative approach is to have the image processing unit operating continually (that is, the element 27 is omitted), with the most recent image analysis always being used as input to the activation control block 25.
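The microphone buffering described above amounts to a fixed-length pre-roll buffer. The sketch below is an assumption-laden illustration: audio is taken to arrive in fixed-duration frames, the class and function names are hypothetical, and the buffer capacity is chosen to cover the worst-case image-analysis latency so that the buffered frames can be handed to the recogniser, oldest first, once it is enabled.

```python
from collections import deque
from typing import Deque, List


def frames_needed(latency_ms: int, frame_ms: int) -> int:
    """Number of frames needed to cover the image-analysis latency (rounded up)."""
    return -(-latency_ms // frame_ms)


class PreRollBuffer:
    """Holds the most recent microphone frames so that speech uttered while
    the image is still being analysed reaches the recogniser once enabled."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen discards its oldest frame when a new one arrives.
        self._frames: Deque[bytes] = deque(maxlen=capacity)

    def push(self, frame: bytes) -> None:
        """Called for every microphone frame, whether or not the recogniser is enabled."""
        self._frames.append(frame)

    def drain(self) -> List[bytes]:
        """Return the buffered frames oldest-first and empty the buffer."""
        frames = list(self._frames)
        self._frames.clear()
        return frames
```

With, say, a 500 ms analysis latency and 20 ms frames, `PreRollBuffer(frames_needed(500, 20))` keeps the last 25 frames; a bounded deque is used because it drops stale audio automatically without any explicit bookkeeping.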

When the user is not looking towards the device, the activation control block 25 keeps the speech recogniser 23 in an inhibited state and the latter therefore produces no output to the device control block 26. However, upon the user looking towards the camera 20, the image processing unit detects this and provides a corresponding indication to the activation control block 25. As a consequence, block 25 enables the speech recognition unit to receive and interpret voice commands from the user. This initial enablement only exists whilst the image processing unit 24 continues to indicate that the user is looking towards the device. Only if the user speaks during this initial enablement phase does the activation control block 25 keep the speech recognition unit 23 enabled after the user looks away, the speech recogniser indicating to block 25 whether the user is speaking.

Once the user stops speaking, block 25 keeps the speech recognition unit 23 enabled for a limited further period (for example, 5 seconds) in case the user wishes to speak again to the device. If the user starts talking again in this period, the speech recogniser interprets the input and also indicates to block 25 that the user is speaking again; in this case, enablement continues until the user stops speaking, with a further limited period of silence allowed following speech cessation.

In this manner, the user can easily ensure that only one device at a time is responsive to voice control.
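The enablement behaviour of activation control block 25 can be modelled as a small state machine. The sketch below is an illustrative assumption rather than the described hardware: it is polled with a timestamp in seconds and two booleans (`looking` from the image processing unit, `speaking` from the speech recogniser), and it uses the 5-second example period from the text as its default timeout. The user must be looking to obtain initial enablement; once they have spoken, enablement survives looking away while speech continues and for the timeout period after it stops, with renewed speech resetting the timer.

```python
from enum import Enum, auto
from typing import Optional


class State(Enum):
    INHIBITED = auto()  # recogniser disabled
    INITIAL = auto()    # enabled only while the user keeps looking
    ENGAGED = auto()    # user has spoken: enabled while speaking + timeout


class ActivationControl:
    """Sketch of the enable/disable logic of activation control block 25."""

    def __init__(self, timeout_s: float = 5.0) -> None:
        self.timeout_s = timeout_s
        self.state = State.INHIBITED
        self._silence_since: Optional[float] = None

    def update(self, now: float, looking: bool, speaking: bool) -> bool:
        """Feed the latest indications; return True if the recogniser is enabled."""
        if self.state is State.INHIBITED and looking:
            self.state = State.INITIAL           # user has looked at the device
        if self.state is State.INITIAL:
            if speaking:
                self.state = State.ENGAGED       # spoke while looking
                self._silence_since = None
            elif not looking:
                self.state = State.INHIBITED     # looked away without speaking
        if self.state is State.ENGAGED:
            if speaking:
                self._silence_since = None       # (renewed) speech resets the timer
            elif self._silence_since is None:
                self._silence_since = now        # speech has just stopped
            elif now - self._silence_since >= self.timeout_s:
                # Timeout expired: fall back to look-only enablement, or disable.
                self.state = State.INITIAL if looking else State.INHIBITED
                self._silence_since = None
        return self.state is not State.INHIBITED
```

Speaking while not looking never enables the recogniser from the inhibited state, which is what keeps neighbouring devices from responding. Falling back to look-only enablement when the timeout expires while the user is still looking is a design choice of this sketch.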

Since a single camera has only a limited field of vision, various measures can be taken to increase the visual coverage provided. For example, the camera can be fitted with a wide-angle lens or with a scanning apparatus, or multiple cameras can be provided.


Figure 2 shows another embodiment which, operating in the same general manner as the Figure 1 arrangement for effecting activation control of voice-controlled devices 14, utilises a set of four fixed room cameras 28 to determine when a user is looking at a particular device. These cameras feed image data via LAN 29 to a device activation manager 30 which incorporates an image processing unit 33 for determining when a user is looking at a particular device, having regard to the direction of looking of the user and the user's position relative to the device positions. When the unit 33 determines that the user is looking at a device 14, it informs a control block 34 which is responsible for informing the device concerned via an infrared link established between an IR transmitter 35 of the manager and an IR receiver 36 of the device. The manager 30 has an associated microphone 31 which, via sound activation control block 37, causes the image processing unit to be activated only when a sound is heard in the room.

The devices themselves do not have a camera system or image processing means and rely on the manager 30 to inform them when they are being looked at by a user. Each device 14 does, however, include control functionality that initially only enables its speech recognition unit whilst the user is looking at the device, as indicated by manager 30, and then, provided the user speaks during this initial enablement, maintains enablement of the speech recogniser whilst the speaking continues and for a timeout period thereafter (as for the Figure 1 embodiment), even if the user looks away from the device.

Figure 3 shows a further camera-based embodiment in which the voice-controlled devices 14 (only two shown) are of substantially the same form as in the Figure 2 embodiment, but the camera system is now a camera 51 mounted on a head mounting carried by the user and facing directly forwards to show what the user is facing towards. This camera forms part of user-carried equipment 50 that further comprises an image processing unit 53, control block 54, and an infrared transmitter 35 for communicating with the devices 14 via their infrared receivers 36. The image from the camera 51 is fed to the image processing unit 53, where it is analysed to see if the user is facing towards a device 14. This analysis can be facilitated by providing each device with a distinctive visual symbol, preferably well lit. When the unit 53 determines that the user is facing a device, it informs the control block 54, which is then responsible for notifying the corresponding device via the IR transmitter 35.
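For the Figure 3 arrangement, one simple way of deciding which device the head-mounted camera is facing is to check which device's distinctive symbol appears nearest the centre of the image. The sketch below assumes that a separate symbol detector (not shown) has already returned the pixel centre of each visible symbol; the tolerance value and the function name are hypothetical.

```python
from typing import Dict, Optional, Tuple


def facing_device(symbol_centres: Dict[str, Tuple[float, float]],
                  image_size: Tuple[int, int],
                  tolerance: float = 0.1) -> Optional[str]:
    """Return the device whose visual symbol is closest to the image centre.

    symbol_centres maps device ids to the pixel coordinates of their detected
    symbols; tolerance is the maximum offset from the centre, expressed as a
    fraction of the image width (horizontally) or height (vertically)."""
    width, height = image_size
    cx, cy = width / 2, height / 2
    best_id, best_dist = None, None
    for dev_id, (x, y) in symbol_centres.items():
        dx, dy = abs(x - cx) / width, abs(y - cy) / height
        if dx > tolerance or dy > tolerance:
            continue  # symbol is not near the middle of the camera's view
        dist = dx * dx + dy * dy
        if best_dist is None or dist < best_dist:
            best_id, best_dist = dev_id, dist
    return best_id
```

Because the camera faces directly forwards, a symbol near the image centre corresponds to a device roughly straight ahead of the user; normalising offsets by image size keeps the tolerance independent of camera resolution.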
[…] knows the user is facing towards the device.






Many other variants are, of course, possible to the arrangement described above. For example, even after its enablement, the speech recogniser of each device can be arranged to ignore voice input from the user unless, whilst the user is looking towards the device, the user speaks a predetermined key word. In this case, after being enabled, the speech recogniser is continually informed of when the user is looking towards the device and only provides an output when the key word is spoken whilst the user is so looking, this output taking account of all words spoken after enablement of the recogniser (the words spoken prior to the key word having been temporarily stored).
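The key-word variant can be sketched as a gate that stores recognised words after enablement and releases them only once the key word has been heard while the user is looking at the device. The class and method names below are hypothetical, the recogniser is assumed to deliver one word at a time together with the current looking indication, and once unlocked this sketch passes subsequent words straight through.

```python
from typing import List, Optional


class KeywordGatedRecogniser:
    """Sketch of the key-word variant: words recognised after enablement are
    stored, and nothing is output until the key word is heard while the user
    is looking at the device."""

    def __init__(self, key_word: str) -> None:
        self.key_word = key_word.lower()
        self._stored: List[str] = []
        self._unlocked = False

    def on_enabled(self) -> None:
        """Called when the activation control enables the recogniser."""
        self._stored.clear()
        self._unlocked = False

    def on_word(self, word: str, looking: bool) -> Optional[List[str]]:
        """Feed one recognised word; return the words to act on, if any."""
        self._stored.append(word)  # temporarily store every word since enablement
        if not self._unlocked and looking and word.lower() == self.key_word:
            self._unlocked = True
        if self._unlocked:
            # Release everything stored so far, including words before the key word.
            output, self._stored = self._stored, []
            return output
        return None
```

A key word spoken while the user looks elsewhere is simply stored and ignored, so a device that happens to hear the key word does not react unless it is also being looked at.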

As a further variant, whether the user is facing towards a device can be determined by detecting the position of the user in the space 11 by any suitable technology and using a direction sensor (for example, a magnetic flux compass or solid state gyroscope) mounted on the user's head to sense the direction of facing of the user, the output of this sensor and the position of the user being passed to a processing unit which then determines whether the user is facing towards a device (in known positions).
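This variant reduces to comparing the sensed heading with the compass bearing from the user to each device. The sketch assumes a flat (east, north) coordinate system in metres, headings in degrees clockwise from north as a compass reports them, and a hypothetical 15-degree tolerance; all names are illustrative. The wrap-around at 0/360 degrees is handled explicitly so that, for example, a heading of 355 degrees still counts as facing a device due north.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (east, north) in metres


def bearing_deg(frm: Point, to: Point) -> float:
    """Compass bearing from frm to to, normalised to 0-360 (0 = north, 90 = east)."""
    east, north = to[0] - frm[0], to[1] - frm[1]
    return math.degrees(math.atan2(east, north)) % 360.0


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in [0, 180]."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def device_faced(user: Point, heading_deg: float, devices: Dict[str, Point],
                 tolerance_deg: float = 15.0) -> Optional[str]:
    """Return the device whose bearing best matches the sensed heading, if any."""
    best_id, best_diff = None, tolerance_deg
    for dev_id, pos in devices.items():
        diff = angular_difference(heading_deg, bearing_deg(user, pos))
        if diff <= best_diff:
            best_id, best_diff = dev_id, diff
    return best_id
```

Note that `math.atan2(east, north)` (rather than the usual `atan2(y, x)` order) is what makes the result a compass bearing measured clockwise from north.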



CLAIMS 



1. A method of activating voice-controlled apparatus, comprising the steps of : 
(a) - detecting when the user is looking towards the apparatus;

(b) - detecting when the user is speaking to the apparatus; and 

(c) - enabling the apparatus for voice control only if steps (a) and (b) indicate that the user 

is simultaneously looking towards the apparatus and speaking to it. 

2. A method according to claim 1, wherein the apparatus, after being enabled for voice control, remains so enabled following cessation of step (a) only whilst step (b) is taking place and for a limited timeout period thereafter, recommencement of step (b) during this period continuing voice control with timing of the timeout period being reset.

3. A method according to claim 1, wherein the apparatus only remains enabled for voice control whilst step (a) is being effected.

4. A method according to any one of the preceding claims, wherein step (a) is effected using a camera system mounted on the apparatus, images produced by the camera system being processed to determine if the user is looking towards the apparatus.

5. A method according to any one of claims 1 to 3, wherein step (a) is effected using a camera system comprising one or more cameras mounted off the apparatus in fixed positions, images produced by the camera system being processed to determine if the user is looking towards the apparatus.

6. A method according to any one of claims 1 to 3, wherein step (a) is effected using a camera system mounted on a user's head and arranged to point in the direction the user is facing or looking, images produced by the camera system being processed to determine if the user is looking towards the apparatus.



7. A method according to any one of claims 1 to 3, wherein step (a) is effected using a directional transmitter mounted on a user's head and arranged to point in the direction the user is facing, the apparatus having a receiver for detecting emissions from the directional transmitter.

8. A method according to any one of claims 1 to 3, wherein step (a) is effected by detecting the position of the user and using a sensor for sensing the direction of facing of the user, the output of this sensor and the position of the user being used to determine whether the user is facing towards a known position of the apparatus.

9. A method according to any one of the preceding claims, wherein speech recognition means of the apparatus ignores voice input from the user unless, whilst the user is looking towards the apparatus, the user speaks a predetermined key word.
ABSTRACT
Activation of Voice-Controlled Apparatus 


A method of activating voice-controlled apparatus (14) is provided which minimises the 
risk of activating more than one such apparatus at a time where multiple voice-controlled 
apparatus exist in close proximity. To activate the apparatus, a user (10) is required both 
to be looking at the apparatus (14) and speaking at the same time. The apparatus is then 
activated, preferably only whilst the speaking continues and for a limited period thereafter.
Detection of whether the user is looking at the apparatus can be effected in a number of 
ways including by the use of camera systems (20,24), by a head-mounted directional 
transmitter, and by detecting the location and direction of facing of the user. 